DETAILED ACTION 
A. Drawings 

1. The drawings are objected to under 37 CFR 1.83(a) because they do not show the 
claimed subject matter for claims 1-8. The drawings don't illustrate the methods that are the 
claimed invention with sufficient clarity. 

a) Applicant has provided two block diagram drawings of proposed embodiments of a 
speech recognition system. However, Applicant's invention as specified in claims 1-8 
consists of methods, not a speech recognition system itself. Due to concurrent ambiguities in 
the language of Applicant's claims and specifications and unclear references to elements 
in the figures, a drawing that illustrates Applicant's claimed invention should show steps, 
sequence of steps or control logic illustrating any method of operating a speech 
recognition system with sufficient detail and clarity to elucidate the claimed invention. 
The subject matter of this application admits of illustration by a drawing to facilitate 
understanding of the invention. 

b) The drawings must show every feature of the invention specified in the claims. Any 
structural detail that is essential for a proper understanding of the disclosed invention 
should be shown in the drawing. MPEP § 608.02(d). The structure of Applicant's 
process is unclear in some claims. Therefore, the method steps claimed must be shown in 
more detail or the feature(s) not shown must be canceled from the claim(s). 

2. The drawings are objected to under 37 CFR 1.83(a) because they lack text labels clearly 
related to the claims and specification. 

a) In the specification, the referencing of the description of the drawings to the elements of 
the drawings is vague and inconsistent. In ¶¶ 24 and 25, Figures 1 and 2 are described as 
first and second embodiments of speech recognition systems, respectively. Then a 
referenced description to "the microphone" and "the speech recognition system" begins 
in ¶ 27 without any information in the text as to which Figure (which speech recognition 
system) is being described. 

b) Because the drawings fail to clearly show labels for the terms used to reference points in 
a speech recognition process or system, they don't aid in limiting the scope of 
Applicant's broad claim language. Given the broad definition of reception quality values, 
noise values and the interchangeability of thresholds to which they apply, the drawings' lack of 
particularity and their vagueness do not help to clarify Applicant's 
invention. 

c) It is suggested that Applicant label drawing elements using text labels that reflect the 
nomenclature of the disclosure. 

3. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to 
the Office action to avoid abandonment of the application. No new matter may be introduced in 
the required drawing. Any amended replacement drawing sheet should include all of the figures 
appearing on the immediate prior version of the sheet, even if only one figure is being amended. 
The figure or figure number of an amended drawing should not be labeled as "amended." If a 
drawing figure is to be canceled, the appropriate figure must be removed from the replacement 
sheet, and where necessary, the remaining figures must be renumbered and appropriate changes 
made to the brief description of the several views of the drawings for consistency. Additional 
replacement sheets may be necessary to show the renumbering of the remaining figures. Each 
drawing sheet submitted after the filing date of an application must be labeled in the top margin 
as either "Replacement Sheet" or "New Sheet" pursuant to 37 CFR 1.121(d). 

4. If the changes are not accepted by the examiner, the applicant will be notified and 
informed of any required corrective action in the next Office action. The objection to the 
drawings will not be held in abeyance. 

5. Applicant is given a TWO MONTH time period to submit a drawing in compliance with 
37 CFR 1.81. Extensions of time may be obtained under the provisions of 37 CFR 1.136(a). 
Failure to timely submit a drawing will result in ABANDONMENT of the application. 

B. Specification 

6. The Specification is objected to on account of numerous language errors in the 
Background. The specification is replete with terms that are not clear, concise and exact. In 
particular, Applicant should review improper translations of qualitative or relational phrases for 
meaningfulness and technical accuracy. Correction of the numerous errors is required. See also 
37 CFR 1.71, MPEP § 608.01. 

7. The Brief Description of the Drawings is objected to because it lacks coordinated 
references to items in the Figures. 

a) For example, in ¶¶ 24 and 25, Figures 1 and 2 are introduced, respectively. Then in ¶ 26 
Applicant states that the embodiments in both figures denote speech recognition systems 
of the "barge-in" type. However, Applicant subsequently goes on to describe, without 
further indicating to which figure he refers, process elements that are not common to both 
figures. 

b) In ¶¶ 24 and 25, Figures 1 and 2 are described as first and second embodiments of speech 
recognition systems, respectively. Then a referenced description to "the microphone" 
and "the speech recognition system" begins in ¶ 27 without any information in the text as 
to which Figure (which speech recognition system) is being described. 

c) The specification must be amended to contain properly explicit references to figures and 
figure elements. All references to Figures and elements must be explicit. 37 CFR 1.74 
states: "When there are drawings, there shall be a brief description of the several views 
of the drawings and the detailed description of the invention shall refer to the different 
views by specifying the numbers of the figures, and to the different parts by use of 
reference letters or numerals (preferably the latter)." 

8. The arrangement and outline of the Specification is objected to for lacking any structure 
that would help clarify what Applicant discloses to be prior art and what he considers to be his 
inventive concept(s), distinctions that are not explicitly stated in the text of the Specification. 

a) In combination with the aforementioned language errors and vagueness of 
references to drawing elements, the lack of structure in the Background further obscures 
the line between the scope of the admitted prior art and what is the inventive concept(s). 

b) Thus, Examiner advises Applicant that the Specification reads as mostly comprising 
admissions of prior art and the Drawings and the Descriptions do not clarify otherwise. 

c) The following guidelines illustrate the preferred layout for the specification of a utility 
application. Whether or not Applicant uses the below outline, Applicant is required to 
clearly set forth that which he considers to be his inventive concepts, as separate and 
distinct from what he considers to be prior art. The following guidelines are suggested 
for the applicant's use. 

Arrangement of the Specification 

As provided in 37 CFR 1.77(b), the specification of a utility application should 
include the following sections in order. Each of the lettered items should appear in upper 
case, without underlining or bold type, as a section heading. If no text follows the section 
heading, the phrase "Not Applicable" should follow the section heading: 

(a) TITLE OF THE INVENTION. 

(b) CROSS-REFERENCE TO RELATED APPLICATIONS. 

(c) STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT. 

(d) THE NAMES OF THE PARTIES TO A JOINT RESEARCH AGREEMENT. 

(e) INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED ON A COMPACT DISC. 

(f) BACKGROUND OF THE INVENTION. 

(1) Field of the Invention. 

(2) Description of Related Art including information disclosed under 37 
CFR 1.97 and 1.98. 

(g) BRIEF SUMMARY OF THE INVENTION. 

(h) BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S). 

(i) DETAILED DESCRIPTION OF THE INVENTION. 

(j) CLAIM OR CLAIMS (commencing on a separate sheet). 

(k) ABSTRACT OF THE DISCLOSURE (commencing on a separate sheet). 

(l) SEQUENCE LISTING (See MPEP § 2424 and 37 CFR 1.821-1.825. A 
"Sequence Listing" is required on paper if the application discloses a 
nucleotide or amino acid sequence as defined in 37 CFR 1.821(a) and if 
the required "Sequence Listing" is not submitted as an electronic 
document on compact disc). 

9. It is suggested that Applicant review terms that are used in English to denote specificity, 
particularity, vagueness, relational modifiers and comparisons for which the translation may vary 
with context and subject matter. It's also suggested that Applicant avoid mixing translated terms 
that are particular in scope with those that are general, in the same clause. Improper use of terms 
that denote scope and particularity obscures the determination of the scope of Applicant's 
statements, especially including his admissions of prior art. It is suggested that Applicant review 
the text for run-on sentences and limit the number of clauses in a sentence to two, in light of the 
moving referential bases he uses and the irregularities in the translation of phrases used to denote 
relation and degree, and arrange the specification as suggested above, if he wishes to clarify what 
he considers to be admissions of prior art versus his own inventive concepts. It is further 
suggested that Applicant indicate, for each paragraph of his detailed description, which Figure is 
relevant to the text in that paragraph and avoid referring to multiple figures in one paragraph. 

C. 35 USC § 112 Paragraphs 3 & 4 - Claim Objection 

10. The following is a quotation from paragraphs 3 and 4 of 35 U.S.C. 112: 

A claim may be written in independent or, if the nature of the case admits, in dependent 
or multiple dependent form. 

Subject to the following paragraph, a claim in dependent form shall contain a reference to 
a claim previously set forth and then specify a further limitation of the subject matter 
claimed. 

11. Claim 4 is objected to under 37 C.F.R. 1.75(c) as being of improper dependent form for 
failing to further limit the subject matter of a previous claim. Claim 4, in adding to claim 1, 
merely states what was obvious in the art at the time the invention was made and fails to further 
limit the subject matter of claim 1 from which it depends. 

a) Polikaitis, et al, teaches a speech recognition system in which incoming signal 
information is used for confidence measure hypothesis testing. Polikaitis, et al, does not 
teach that a voice activity detector provides the confidence measures. 

b) "Reception quality value" and "noise value" and the thresholding of them are broadly 
defined in the Specification, so that any signal or line noise measures, or speech 
recognition process parameter or result, intermediate or post-process, can serve as these 
values. Similarly, a "voice activity detector" can comprise any system that differentiates 
a speech signal from a non-speech signal and in its canonical form is described in the way 
Applicant describes his confidence measures hypothesis testing. 

c) The subject matter that falls within the scope of "voice activity detector" is broader than 
the limitations of claim 1. While not all voice activity detectors are so simple as to utilize 
thresholding, that processing concept describes the canonical voice activity detector. 
"The basic function of a VAD algorithm is to extract some measured features or 
quantities from the input signal and to compare these values with thresholds, usually 
extracted from the characteristics of the noise and speech signals. Then, voice-active 
decision is made if the measured values exceed the thresholds." Tanyer, S. and Ozer, 
Hamza, "II. Voice Activity Detection, ¶ 1", "Voice Activity Detection in Nonstationary 
Noise", IEEE Trans. Speech & Audio Processing, Vol. 8, No. 4, July 2000 (hereinafter 
"Tanyer and Ozer"). 

d) Claim 4 introduces no new limitations except to associate the term "voice activity 
detector" with the metrics "reception quality value" and "noise quality value" and their 
thresholding. At most, claim 4 merely states what is obvious in the art or associates a 
term or label with a limitation in the independent claim from which it depends. 

e) A dependent claim must further limit the subject matter of the independent claim. MPEP 
§ 608.01(i)(k). Applicant is required to cancel the claim(s), or amend the claim(s) to 
place the claim(s) in proper dependent form, or rewrite the claim(s) in independent form. 
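
For illustration only, the canonical thresholding detector described in the Tanyer and Ozer passage quoted in c) above can be sketched as follows. This is a minimal sketch and not the method of any cited reference: the function names, the choice of frame energy as the measured feature, and the 6 dB margin are all assumptions made for the example.

```python
import math


def frame_energy_db(frame):
    """Mean-square energy of one frame of samples, in decibels."""
    mean_square = sum(s * s for s in frame) / len(frame)
    # A small constant avoids log10(0) on an all-zero frame.
    return 10.0 * math.log10(mean_square + 1e-12)


def detect_voice(frames, noise_frames, margin_db=6.0):
    """Flag a frame as voice-active when its energy exceeds a threshold
    derived from noise-only frames (hypothetical 6 dB margin)."""
    noise_db = sum(frame_energy_db(f) for f in noise_frames) / len(noise_frames)
    threshold_db = noise_db + margin_db
    return [frame_energy_db(f) > threshold_db for f in frames]


# Synthetic data (assumed): three quiet noise frames and one louder frame.
noise = [[0.01, -0.01, 0.01, -0.01]] * 3
speech = [0.5, -0.4, 0.6, -0.5]
decisions = detect_voice([noise[0], speech], noise)
print(decisions)
```

In this sketch the measured feature is compared against a threshold extracted from the noise characteristics, which is the comparison the quoted passage describes.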

D. 35 USC § 101 - Claim Rejection 

12. 35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or 
composition of matter, or any new and useful improvement thereof, may obtain a patent 
therefor, subject to the conditions and requirements of this title. 

13. Claim 10 is rejected under 35 U.S.C. § 101 as it consists of program code for carrying out 
steps of methods in claim 1. The claimed invention is directed to non-statutory subject matter. 

E. 35 USC § 112 Paragraph 1 - Claim Rejections 

14. The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner 
and process of making and using it, in such full, clear, concise, and exact terms as to 
enable any person skilled in the art to which it pertains, or with which it is most nearly 
connected, to make and use the same and shall set forth the best mode contemplated by 
the inventor of carrying out his invention. 

15. Claim 8 is rejected under 35 U.S.C. 112, first paragraph, as failing to comply with the 
enablement requirement. 

a) The claim(s) contains subject matter which was not described in the specification in such 
a way as to enable one skilled in the art to which it pertains, or with which it is most 
nearly connected, to make and/or use the invention. 

b) In extending the disclosed prior art with the added limitation of sending diagnostic 
information to the user, it's unclear whether the state of the art supports the feature of 
Applicant's method claim. Applicant fails to disclose a remote diagnostic analysis in the 
audio band that would be able to detect and identify audio noise problems on the user's 
end and in the connection at least as well as the user's own ears and observations. In an 
interactive remote speech application, users typically and competently discern whether 
they can improve a poor session by speaking louder, moving to a less obstructed 
transmission/communication environment or doing something to quiet their 
background noise. There is no technical information in the Specification that 
supports Applicant's implied device that is diagnostic of a remote user's audio 
environment. Applicant's disclosed references also do not speak to the claimed feature. 
See MPEP § 2164. 

F. 35 USC § 112 Paragraph 2 - Claim Rejections 

16. The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and 
distinctly claiming the subject matter which the applicant regards as his invention. 

17. Claims 1-10 are rejected as being indefinite and failing to particularly point out and 
distinctly claim the subject matter which applicant regards as the invention. The claims are 
indefinite due to vagueness of definition in values and thresholds, in having a broad range or 
limitation together with a narrow range or limitation that falls within the broad range or 
limitation. Claims 2-8 and 10 are rejected in that they include claim 1 by reference and 
rejections of claim 1 based on the use of reception quality value and threshold crossings are also 
incorporated into rejections of claims 1-8 and 10. 

a) Applicant's disclosure defines "reception quality value" as being a measure of either 
signal or noise: "The reception quality value can then be determined, for example, on the 
basis of a background signal received in a speech pause of the user. This means that, for 
example, a noise level or the basic signal energy is measured at the input during the 
speech pauses so as to be used as a measure of the reception quality." (Specification, 
¶ 15). Therefore, under Applicant's definitions, "reception quality value" and "noise level" are 
both "reception quality measures". 

b) Applicant then states that the two value measures, "reception quality value" and "noise 
value", are interchangeable in their threshold crossings, apart from their generally being 
reciprocal: "Because the monitoring of a reception quality value in respect of this value 
dropping below a given reception quality threshold is identical, except for the use of 
reciprocal values as well as the corresponding reversal of the limit condition, to the 
monitoring of a noise value, for example, the level of a background noise signal, in 
respect of this value exceeding a given noise threshold, hereinafter the invention will be 
described in general only on the basis of the first version for the sake of simplicity, 
however, without the invention being restricted thereby in any way. The corresponding 
terms of the two versions can be interchanged at all times in the following description." 
(Background, ¶ 7). Applicant should also note that there is no antecedent basis for the 
reference to "the limit condition", meaning that there is no frame of reference whatsoever 
for the reception quality threshold and noise threshold-crossings, which Applicant defines 
as being interchangeable. 

c) Finally, Applicant goes on to broaden the definition of "reception quality value" to 
include any parameter or signal of a process in the speech recognition system: 
"Furthermore, the reception quality value can also be determined by means of the actual 
speech recognition device itself, for example, on the basis of confidence values obtained 
for the recognition results or on the basis of other parameters which are dependent, for 
example, on the quality of the recognition result or on the effort made for the 
recognition." (Background, ¶ 15). 

d) A broad range or limitation together with a narrow range or limitation that falls within the 
broad range or limitation (in the same claim) is considered indefinite, since the resulting 
claim does not clearly set forth the metes and bounds of the patent protection desired. 
See MPEP § 2173.05(c). Note the explanation given by the Board of Patent Appeals and 
Interferences in Ex parte Wu, 10 USPQ2d 2031, 2033 (Bd. Pat. App. & Inter. 1989), as to 
where broad language is followed by "such as" and then narrow language. The Board 
stated that this can render a claim indefinite by raising a question or doubt as to whether 
the feature introduced by such language is (a) merely exemplary of the remainder of the 
claim, and therefore not required, or (b) a required feature of the claims. Note also, for 
example, the decisions of Ex parte Steigewald, 131 USPQ 74 (Bd. App. 1961); Ex parte 
Hall, 83 USPQ 38 (Bd. App. 1948); and Ex parte Hasche, 86 USPQ 481 (Bd. App. 1949). 

e) Given the foregoing broad definitions of the value measures, the absence of any frame of 
reference for directionality in threshold crossings and the actual interchangeability of the 
values and the directions of their threshold crossings, it is practically impossible to assign 
limited meanings to Applicant's use of the following phrases in the claims: "a reception 
quality value or a noise value which represents a current reception quality", and "when 
the reception quality value drops below a given reception quality threshold or when the 
noise value exceeds a noise threshold." 

f) Accordingly, claims that refer to "reception quality" and "noise" values and "threshold", 
and any associated crossings of thresholds are taken to imply that any measure of signal 
or noise, or any parameter or set of parameters or values generated in a speech 
recognition process, that is applied to any threshold in any combination, and crosses such 
threshold in any direction, satisfies the reception quality value and threshold limitations 
in the claims. All references to "reception quality value", "noise value" and associated 
thresholds are taken to denote any confidence measures and any hypothesis testing 
thereof that is applicable to an interactive automatic speech recognition session over a 
communications link. 
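
To illustrate the interchangeability that Applicant asserts in the passage quoted in b) above, the following minimal sketch compares the two monitoring conditions. It assumes, for the example only, that the noise value is modeled as the exact reciprocal of a positive reception quality value and that the noise threshold is the reciprocal of the reception quality threshold. The names and sample values are hypothetical and do not come from the Specification.

```python
def quality_below(quality, quality_threshold):
    """First version: reception quality value drops below its threshold."""
    return quality < quality_threshold


def noise_above(noise, noise_threshold):
    """Second version: noise value exceeds its threshold."""
    return noise > noise_threshold


# Hypothetical positive quality readings and threshold.
quality_samples = [0.1, 0.5, 1.0, 2.0, 8.0]
quality_threshold = 0.8

# Assumed model: noise value and noise threshold are the reciprocals.
pairs = [
    (
        quality_below(q, quality_threshold),
        noise_above(1.0 / q, 1.0 / quality_threshold),
    )
    for q in quality_samples
]
conditions_agree = all(first == second for first, second in pairs)
print(conditions_agree)
```

Under that reciprocal assumption the two conditions fire on the same samples, which is why a claim reciting both adds no separate constraint unless the values and thresholds are defined independently.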

18. Notice to Applicant: Applicant should take note of the interpretation of "and/or" 
language in claims. 

a) Claims 1, 5-7 and 10 contain "and/or" language. Claims 2-8 and 10 include claim 1 by 
reference and the use of "and/or" is also incorporated by reference into claims 2-8 and 10. 

b) Any "and/or" language in the claims is interpreted as "or" because an "AND" operation 
OR'ed with an "OR" operation for the same two inputs is equivalent to an "OR" 
operation in the context that Applicant uses it. 

c) In Applicant's claims, the "or" is wholly satisfied if the speech recognition system either 
(1) "switches over to a mode of operation which is less sensitive to noise" or (2) "outputs 
an alert signal to the user". The teachings of Polikaitis, et al, are consistent with 
Applicant's claim 1 that takes option (2) (but not option (1)) and therefore the teachings 
of Polikaitis, et al, anticipate Applicant's claim 1 since option (1) isn't required. 
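
The Boolean reasoning in b) above can be checked by exhaustive enumeration. In the following minimal sketch, `a` stands for "switches over to a mode of operation which is less sensitive to noise" and `b` for "outputs an alert signal to the user"; the variable and function names are illustrative only.

```python
from itertools import product


def and_or(a, b):
    """An AND of the two inputs, OR'ed with an OR of the same two inputs."""
    return (a and b) or (a or b)


# Each row: (a, b, result of "and/or", result of plain "or").
truth_table = [
    (a, b, and_or(a, b), a or b)
    for a, b in product([False, True], repeat=2)
]
and_or_equals_or = all(row[2] == row[3] for row in truth_table)
print(and_or_equals_or)
```

Every row of the table gives the same result for "and/or" as for an inclusive "or", so satisfying either alternative alone satisfies the limitation.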

19. Claims 1-9 are rejected as being indefinite and failing to particularly point out and 
distinctly claim the subject matter which the applicant regards as his invention. The use of the 
word "characterized" is imprecise language that doesn't denote a required trait, just a typical or 
distinctive one. (WordNet 3.0, http://wordnet.princeton.edu). All limitations associated with the 
word "characterized" can denote features that are merely typical and not required, and do not 
comprise patentable limitations. 

20. Claim 2 is rejected because it is indefinite and fails to particularly point out and distinctly 
claim the subject matter which applicant regards as the invention. 

a) Claim 2 states "A method as claimed in claim 1, characterized in that the speech 
recognition system is automatically reset to the previous mode of operation when the 
reception quality value exceeds the reception quality threshold again or when the noise 
value drops below the noise threshold again." 

b) Claim 2 assumes that, according to prior events described in claim 1, the mode of 
operation has changed during the steps outlined in claim 1. However, a change in the 



Application/Control Number: 10/532,919 Page 15 

Art Unit: 2611 

mode of operation is not required in claim 1. But as "and/or" is an "inclusive or" the 
steps in claim 1 can take any of the following forms in the event of a failure of the 
confidence hypothesis test: (1) the speech recognition system switches over to a mode of 
operation which is less sensitive to noise and outputs an alert signal to the user; (2) the 
speech recognition system switches over to a mode of operation which is less sensitive to 
noise and doesn't output an alert signal to the user; and (3) the speech recognition system 
outputs an alert signal to the user without switching to another mode of operation. 

c) Claim 2, in its dependence on the mode of operation having switched in claim 1 
according to option (1) or (2), is indeterminate and therefore indefinite because option (3) 
may have occurred instead. 
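
The three forms listed in b) above can also be enumerated mechanically to see which of them leave a previous mode of operation for claim 2 to reset to. This is a minimal sketch; the field names are hypothetical, and the numbering follows the order in which b) lists the forms.

```python
from itertools import product

# Every combination of the two actions that satisfies the inclusive "or",
# in the same order as forms (1), (2) and (3) in b).
forms = [
    {"switched_mode": switched, "alerted_user": alerted}
    for switched, alerted in product([True, False], repeat=2)
    if switched or alerted
]

# Forms in which no mode switch occurred, so that there is no
# "previous mode of operation" for the reset recited in claim 2.
forms_without_switch = [
    number
    for number, form in enumerate(forms, start=1)
    if not form["switched_mode"]
]
print(forms_without_switch)
```

Any form reported by the sketch satisfies claim 1 while leaving the reset step of claim 2 without a defined starting condition.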

21. Claim 5 is rejected because it is indefinite and fails to particularly point out and distinctly 
claim the subject matter which applicant regards as the invention. The claim is indefinite due to 
the lack of antecedent basis for the claim element "the utterance". 

22. Claim 6 is rejected because it is indefinite and fails to particularly point out and distinctly 
claim the subject matter which applicant regards as the invention. The claim is indefinite due to 
the lack of antecedent basis for the claim element "the voice activity detector". 

23. Claim 6 is rejected because its meaning is indecipherable. 

a) Taken as a whole, Examiner cannot speculate as to Applicant's meaning. Examiner 
cannot ascertain what Applicant means with the use of the word "itself". There is no 
frame of reference for self-ness or other-ness that provides context for "itself". 

b) Does Applicant intend to say that a reception quality value, a noise value or a reception 
corruption indication signal is applied to a dialog control device? Given Applicant's 
broad definitions of "reception quality value" and "noise value" the three terms are 
interchangeable. 

24. Claim 7 is rejected because it is indefinite and fails to particularly point out and distinctly 
claim the subject matter which applicant regards as the invention. The claim is indefinite due to 
the lack of antecedent basis for the claim element "the reception corruption indication signal". 

25. Claim 8 is rejected as being indefinite and failing to particularly point out and distinctly 
claim the subject matter which applicant regards as the invention. The claim is indefinite due to 
lack of antecedent basis in more than one claim element. 

a) For example, claim 8 is indefinite due to lack of antecedent basis in the phrase "an 
incoming signal is analyzed as regards the type of disturbance", which implies that the 
incoming signal to the speech recognition system always has a known disturbance of a 
known type. 

b) Claim 8 also lacks antecedent basis for the phrase "a prompt which contains this 
information is output to the user". It's indeterminate whether "this information" refers to 
the type of disturbance or the fact that the hypothesis test failed, since "this information" 
can refer back to more than one antecedent basis. 

26. Claim 8 is rejected as indefinite and failing to particularly point out and distinctly claim 
the subject matter which applicant regards as the invention. The claim is indefinite as its 
meaning is indecipherable. Claim 8 is indefinite in the use of referential prepositions in alluding 
to a technical process. 

a) Examiner cannot ascertain what "analyzed as regards the type of disturbance" refers to, 
i.e. whether Applicant's claimed invention has the ability to remotely identify what is 
causing sound noise in the user's background or whether Applicant proposes to conduct a 
diagnostic of the communications system and provide that information to the user. 

b) Examiner cannot ascertain whether "this information" that is communicated back to the 
user refers to information about the "type of disturbance" causing the confidence 
measures hypothesis test to fail or just the fact that the test failed. 

27. Claim 9 is rejected as being indefinite and failing to particularly point out and distinctly 
claim the subject matter which applicant regards as the invention. 

a) Claim 9 is indefinite in its use of vague prepositional references and cascaded modifiers 
that hide the antecedent bases for claimed limitations. The cascading of modifying 
clauses makes it practically impossible for Examiner to determine what limitations 
definitely apply to what preceding claim element. Applicant recites the list of limitations 
more like a linked list than as an enumeration. 

b) Examiner cannot determine whether "characterized in that it comprises a quality control 
device" limits the speech recognition system, the means for detection of a speech signal, 
or the speech recognition device. 

c) It is suggested that the use of outlining or indentation that shows elements grouped with 
their intended limitations would help clarify the enumeration of claim 9. 

28. The scope of the limitations in the above claims rejected under 35 U.S.C. § 112 
Paragraph 2 is not reasonably ascertainable by one skilled in the art. 

G. 35 USC § 102 - Claim Rejection 

29. The following is a quotation from 35 U.S.C. 102: 
A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or 
in public use or on sale in this country, more than one year prior to the date of application for 
patent in the United States. 

30. Claim 1 is rejected under 35 U.S.C. 102(b) as anticipated by Audrius Polikaitis, et al, U.S. Patent 
Number 6336091 (hereinafter "Polikaitis, et al"). 

a) In this rejection, the use of "and/or" is properly interpreted as "or" and the "or" option that results 
from a failure of the confidence measure hypothesis test is that the speech recognition system 
"outputs an alert signal to the user". 

b) Polikaitis teaches a method for operating a speech recognition system (Polikaitis, et al, Fig. 2 and 
Fig. 3), in which method a speech signal of a user (Polikaitis, et al, Fig. 2 and Fig. 3, 215) is 
detected and analyzed so as to recognize speech information (Polikaitis, et al, Fig. 2 and Fig. 3, 
220) contained in the speech signal, characterized in that there is determined a reception quality 
value or a noise value (Polikaitis, et al, Fig. 2 and Fig. 3, variables in 230, 240, 250, 260) which 
represents a current reception quality, and in that the speech recognition system switches over to a 
mode of operation which is less sensitive to noise and/or outputs an alert signal to the user 
(Polikaitis, et al, Fig. 2 and Fig. 3, 233, 243, 253, 263) when the reception quality value drops 
below a given reception quality threshold or when the noise value exceeds a noise threshold 
(Polikaitis, et al, Fig. 2 and Fig. 3, thresholds in 230, 240, 250, 260). 

c) In the context of this claim, the "and/or" is commensurate with an "or". The "or" is satisfied 
if the speech recognition system (1) "switches over to a mode of operation which is less sensitive 
to noise" or (2) "outputs an alert signal to the user". Polikaitis, et al, teaches the invention of 
claim 1 consistent with option (2). 

H. 35 USC § 103 - Claim Rejections 

31. Claim 3 is rejected under 35 U.S.C. 103(a) as being unpatentable over Polikaitis, et al, as applied 
to claim 1, and further in view of Nguyen, John N., USPN 5765130 (hereinafter "Nguyen") and Crane, 
Matthew, et al, USPN 7069221 (hereinafter "Crane, et al"). 

a) Polikaitis, et al, teaches the method of claim 1 upon which claim 3 depends. Polikaitis, et al, 
doesn't teach that a barge-in mode of operation is disabled based on the confidence measures 
hypothesis test results. 

b) Nguyen teaches a barge-in feature for a speech recognition system in which, when the reception 
quality value drops below the reception quality threshold or the noise value exceeds the noise 
threshold (Nguyen, Col. 5, lines 27-32), the barge-in mode of operation (Nguyen, Abstract) of the 
speech recognition system (Nguyen, Abstract) is deactivated. 

c) It would have been obvious to one of ordinary skill in the art to implement the teachings of 
Nguyen into the teachings of Polikaitis, et al, since Polikaitis suggests a speech recognition 
system that performs confidence measures hypothesis testing via thresholding on the received 
signal to provide voice prompts to the user to manage the speech recognition process to be more 
effective and since Nguyen teaches a barge-in feature that prevents users' voices and barge-in 
echo from combining to disrupt the interactive session, also threshold-based, wherein his 
invention "provide[s] a method and apparatus for implementing . . .barge in" in the analogous art 
of telephone voice recognition systems and where ". . .the invention is useful even in the absence 
of local echo cancellation, since it still provides a dynamic threshold for determination of whether 
a user signal is being input concurrent with a prompt." (Nguyen, Col 7. lines 6-9). 

d) Crane, et al, teaches a barge-in feature for a speech recognition system in which, when the source 
of signal is determined to be a non-target barge-in, the barge-in mode of operation is deactivated 
(Crane, et al, Fig. 3 Elements 70, 72, 74). 

e) It would have been obvious to someone of ordinary skill in the art at the time the invention was 
made to implement the teachings of Crane, et al, into Polikaitis, et al, since Polikaitis, et al, 
teaches an automatic interactive speech recognition system that uses voice prompts with the user 
in combination with incoming signal conditions hypothesis testing to threshold for disruptions, 
and since Crane, et al, teaches a method of determining whether a potential barge-in signal 
energy detected is that of a user, with thresholding of a confidence measure to enable or disable 
prompt play in a barge-in system as "in one embodiment, [wherein] recognizer 37 determines 
whether the sound received is a target or a non-target signal by obtaining a score for that signal, 
and determining whether the score exceeds a threshold for recognizing the signal as a target (or as 
a non-target) signal." (Crane, et al, "Description of Preferred Embodiments" ¶ 18). 

32. Claim 5 is rejected under 35 U.S.C. 103(a) in light of Polikaitis, et al, as applied to claim 1 and 
further in view of S. Van Gerven and F. Xie, Proc. Eurospeech, 1997 (hereinafter "Gerven and Xie"). 

f) Polikaitis, et al, teaches the method of claim 1 upon which claim 3 depends. Polikaitis, et al, 
does not teach that the estimates of noise level are based on measuring the background signal, i.e. the 
input signal before the user speaks or during speech pauses. 

a) Gerven and Xie teach that correct voice activity detection should include characterizing the noise 
during noise periods and characterizing the speech during speech periods. Gerven and Xie teach 
characterizing a reception quality value (Gerven and Xie, § A ¶ 1, "energy of the total signal in the 
presence of speech") or a noise value (Gerven and Xie, § A ¶ 1, "varying noise level"), 
determined on the basis of a background signal (Gerven and Xie, § A ¶ 1, "background noise") 
which is received prior to the beginning of the utterance and/or in a speech pause of the user 
(Gerven and Xie, § B ¶ 1, "voice inactive segments."). 

b) It would have been obvious for one of ordinary skill in the art at the time the invention was made to implement the teachings 
of Gerven and Xie into the teachings of Polikaitis, et al, since Polikaitis, et al, suggests the 
benefits of measuring signal metrics associated with speech and noise against thresholds to 
reduce operational error and Gerven and Xie teach obtaining signal and noise metrics during 
speech and non-speech periods, respectively. (Gerven and Xie, page 1,14) ("Adaptive speech 
enhancement algorithms typically behave completely different during speech periods than during 
noise periods. During speech periods the algorithms should learn as much as possible about the 
speech source and during noise periods as much as possible about the noise source(s). Correct 
voice activity detection (VAD) is therefore crucial to their success."). Gerven and Xie also 
suggest that measuring the background noise during speech pauses is the "classic energy 
threshold method" of voice activity detection. (Gerven and Xie, § B ¶ 1). 
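
As an illustrative sketch only, an energy-threshold approach of the kind referred to above can be written in Python: frames whose energy falls below a threshold are treated as voice-inactive, and the noise value is estimated from those frames alone. The names frame_energy and estimate_noise_value, the fixed threshold, and the sample data are assumptions made for this sketch and are not taken from Gerven and Xie or from Polikaitis, et al.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(sample * sample for sample in frame) / len(frame)


def estimate_noise_value(frames, energy_threshold):
    """Average energy of frames classified as voice-inactive.

    A frame is treated as voice-inactive (before the utterance or in a
    speech pause) when its energy is below energy_threshold. Returns None
    when no such frame exists, so that no noise value is invented.
    """
    inactive = [
        frame_energy(frame)
        for frame in frames
        if frame and frame_energy(frame) < energy_threshold
    ]
    if not inactive:
        return None
    return sum(inactive) / len(inactive)


# Hypothetical data: quiet frame, loud (speech) frame, quiet frame.
frames = [[0.1, -0.1], [1.0, -1.0], [0.2, -0.2]]
print(estimate_noise_value(frames, energy_threshold=0.5))
```

Only the two quiet frames contribute to the estimate here; the loud frame is excluded as speech.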

33. Claim 6 is rejected under 35 U.S.C. 103(a) as being obvious over Polikaitis, et al, as applied to 
claim 4 above further in light of Marx, Matthew, et al, USPN 6173266 (hereinafter "Marx, et al"). 

a) Polikaitis, et al, teaches the method of claim 1 upon which claim 3 depends. Polikaitis, et al, 
does not teach that a reception corruption signal is sent to a dialog control device. 

b) Dialog control devices existed and were widely used in the art at the time the invention was made 
for the purposes of managing voiced interactive sessions in automatic speech recognition 
systems. 

c) Marx, et al, teach a dialog control module feature for an automatic interactive speech recognition 
system (Marx, et al, Fig 4) characterized in that the voice activity detector (Marx, et al, Fig 2) 
applies the reception quality value (Marx, et al, Fig 2, 260) or the noise value (Marx, et al, Fig 2, 
270) itself (Marx, et al, Fig 2, 250) and/or, when the reception quality value drops below the 
reception quality threshold (Marx, et al, Fig 2, 280) or when the noise value exceeds the noise 
threshold (Marx, et al, Fig 2, 280), a reception corruption indication signal (Marx, et al, Fig 2, 
215) to a dialog control device (Marx, et al, Fig 4, 430). 

d) Thus, it would have been obvious for one of ordinary skill in the art at the time the invention was 
made to implement the teachings of Marx, et al, into the teachings of Polikaitis, et al, since 
Polikaitis, et al, suggest the benefits of an interactive speech recognition system with voice 
activity detection for confidence measures hypothesis testing and use the results to mitigate errors 
due to reception conditions, and since Marx, et al, suggest the use of dialog control modules to 
manage the user interaction session in conjunction with confidence measures hypothesis testing to 
manage the interactive session. (Marx, et al, Col. 3 Line 46). 

34. Claim 7 is rejected under 35 U.S.C. 103(a) as being unpatentable over Polikaitis, et al, and Marx, 
et al, as applied to claim 1 above further in view of Vanbuskirk, et al, USPN 6505155 (hereinafter 
"Vanbuskirk, et al"). 

a) Polikaitis, et al, teach a speech recognition system with voiced error prompts to the user based on 
the results of confidence measures hypothesis testing of incoming signal features. Marx, et al, 
teaches a speech recognition system wherein a dialog control device manages the voiced 
interactive session with a user in an interactive, automated speech system. Polikaitis, et al, and 
Marx, et al, do not teach that the user is sent any particular information when hypothesis tests fail. 

b) Vanbuskirk, et al, teaches a method for operating a speech recognition system characterized in that 
an incoming signal (Vanbuskirk, et al, Fig. 4A Element 22) is analyzed (Vanbuskirk, et al, Fig. 
4A Element 25, 33) as regards the type of disturbance causing (Vanbuskirk, et al, Fig. 4A 
Element 29) the reception quality value to be below the reception quality threshold or the noise 
value to be above the noise threshold (Vanbuskirk, et al, Fig. 4A Element 31), and that the dialog 
control device initiates the output of a prompt to the user (Vanbuskirk, et al, Fig. 4C-G Element 
49, 44, 51, 53, 57) who is thus given the information that the reception conditions are poor 
(Vanbuskirk, et al, Fig. 4C-G Element C). 

c) It would have been obvious to one of ordinary skill in the art at the time the invention was made 
to implement the teachings of Vanbuskirk, et al, into the speech recognition system taught in 
Polikaitis, et al, since Polikaitis suggests a voiced alert to the user that the voice recognition 
conditions are likely to lead to an error, while Vanbuskirk suggests that his system of dynamically 
composed voice prompts to the user reflecting poor ambient noise conditions serves that purpose. 
(Vanbuskirk, et al, 15 and 21). (i.e. The invention serves to "anticipate that recognition errors 
in consequence of heightened background noise . . . [and] . . . proactively adjust feedback [to the 
user]." and "Responsive to predicted adequate recognition accuracy, the present invention could 
reduce prompt feedback in the computer responsive prompt."). 

d) Vanbuskirk's invention provides for dynamically composed voiced prompts to the user based on 
confidence measure hypothesis testing of the received signals. Vanbuskirk does not teach 
explicitly that the information of the dynamically composed voice prompt to the user is that the 
reception conditions are poor when the confidence measures test fails. 

e) The sending of information about reception condition problems is already anticipated in 
Polikaitis: "Alternatively, the microprocessor may permit the speech recognition processing to 
continue with a warning that the speech recognition output may be incorrect due to the error in 
the speech signal format" wherein errors in the speech signal format include "speech energy, 
noise energy, start energy, end energy, the percentage of clipped speech samples and other speech 
or signal related parameters within the speech acquisition window." (Polikaitis, et al, Col. 2, Lines 
41 and 51). 

35. Claim 8 is rejected under 35 U.S.C. 103(a) as being unpatentable over Polikaitis, et al, as applied to claim 
1 above further in view of Vanbuskirk, et al, and Steinbrenner, Kurt W., et al, USPN 6754310 (hereinafter 
"Steinbrenner, et al"). 

a) Polikaitis, et al, teaches the method of claim 1 upon which claim 3 depends. Polikaitis, et al, does 
not teach that the incoming signal is analyzed for the type of reception problem that occurs when 
the confidence measures hypothesis testing fails, and that this information is provided to the user 
via voiced messages. 

b) Steinbrenner, et al, teach a method for operating an interactive automatic telephony system 
wherein an incoming signal (Steinbrenner, et al, Col. 2 Lines 31-42) is analyzed (Steinbrenner, et 
al, Col. 6 line 64 - Col. 7 line 1) as regards the type of disturbance causing the reception quality 
value to be below the reception quality threshold or the noise value to be above the noise 
threshold (Steinbrenner, et al, Col. 6 line 67), and that a prompt (Steinbrenner, et al, Fig. 5. 
Element 80; Fig. 7, 124) which contains diagnostic information (Steinbrenner, et al, Fig. 5. 
Element 78; Fig. 7, 122) is output (Steinbrenner, et al, Fig. 5. Element 82; Fig. 7, 126) to the user. 

c) Further note that Applicant's preferred embodiments of his invention included telephone- 
based systems such as those in Steinbrenner, et al: "Examples of such speech dialog systems are 
automatic answering and information systems which nowadays are used in particular by some 
large companies and public services so as to offer a caller as quickly and as comfortably as 
possible with the desired information ... [f]urther examples in this respect are automatic telephone 
information systems..." (Specification, ¶ 2). In the embodiments of Applicant's invention that 
consist of interactive telephone answering and information systems, the diagnostic information to 
be provided back to the user would necessarily be that taught in Steinbrenner, et al. 

d) It would have been obvious to a person of ordinary skill in the art at the time the invention was 
made to implement the teachings of Steinbrenner, et al, into Polikaitis, et al, since Polikaitis, et al, 
suggests voiced alert prompts to the user under poor reception conditions while Steinbrenner, et 
al, describes the benefits of combining voiced prompts to the user containing network and device 
diagnostic information in the analogous art of interactive automatic telephony. (Steinbrenner, et 
al, Col. 3 Lines 25 - 29). 

36. Claim 9 is rejected under 35 U.S.C. 103(a) as being unpatentable over Polikaitis, et al, further in 
view of Marx, et al, and Bridges, James USPN 5978763 (hereinafter "Bridges"). 

a) Polikaitis teaches a speech recognition system (Polikaitis, et al, Fig. 2 and Fig. 3) for the detection 
of a speech signal of a user (Polikaitis, et al, Fig. 2 and Fig. 3, 215) and a speech recognition 
device (Polikaitis, et al, Fig. 2 and Fig. 3, 290) so as to recognize speech information contained in 
the speech signal, characterized in that it comprises a quality control device for determining a 
reception quality value or a noise value, (Polikaitis, et al, Fig. 2 and Fig. 3, variables in 230, 240, 
250, 260) representing a current reception quality, a comparator for comparing the reception 
quality value with a predetermined reception quality threshold or for comparing the noise value 
with a given noise threshold (Polikaitis, et al, Fig. 2 and Fig. 3, thresholds in 230, 240, 250, 260), 
and control means which are constructed in such a manner that the speech recognition system is 
switched over to a mode of operation which is less sensitive to noise and/or an alert signal is 
output to the user (Polikaitis, et al, Fig. 2 and Fig. 3, 233, 243, 253, 263) when the reception 
quality value drops below a given reception quality threshold or when the noise value exceeds a 
noise threshold. 

b) In the context of this claim, the "and/or" is commensurate with an "or". The "or" is satisfied 
if the speech recognition system (1) "switches over to a mode of operation which is less sensitive 
to noise" or (2) "outputs an alert signal to the user". Polikaitis, et al, teaches the invention of 
claim 1 consistent with option (2). 

c) Polikaitis, et al, do not teach that a control means causes the speech recognition system to send an 
alert signal to the user. 

d) Marx, et al, teach a dialog control module that provides a control means for controlling an 
automatic interactive speech recognition session (Marx, et al, Fig 4) characterized in that the 
voice activity detector (Marx, et al, Fig 2) applies the reception quality value (Marx, et al, Fig 2, 
260) or the noise value (Marx, et al, Fig 2, 270) itself (Marx, et al, Fig 2, 250) and/or, when the 
reception quality value drops below the reception quality threshold (Marx, et al, Fig 2, 280) or 
when the noise value exceeds the noise threshold (Marx, et al, Fig 2, 280), a reception corruption 
indication signal (Marx, et al, Fig 2, 215) to a dialog control device (Marx, et al, Fig 4, 430). 

e) It would have been obvious for one of ordinary skill in the art at the time the invention was made 
to implement the teachings of Marx, et al, into the teachings of Polikaitis, et al, since Polikaitis, et 
al, suggest the benefits of an interactive speech recognition system with voice activity detection 
for confidence measures hypothesis testing to mitigate errors due to reception conditions, and 
since Marx, et al, suggest the use of dialog control modules to manage the user interaction session 
in conjunction with confidence measures hypothesis testing to manage the interactive session. 
(Marx, et al, Col. 3 Line 46). 

f) Polikaitis, et al, does not teach that a comparator is used for thresholding. However, a device that 
tests a value against a threshold is a kind of comparator. Thus, Applicant adds no new limitation 
in claim language where an initially narrow limitation (testing against a threshold) is followed by 
a broader limitation (use of a comparator). 

g) Embodiments in which the dependent claim's added limitation of the comparator does not imply 
a limitation that is broader than the independent claim's thresholding include applications in 
which the thresholding is a function that is separate and apart from the comparison against a 
threshold. 

h) Bridges teaches a voice activity detection method that uses a comparator (Bridges, Fig 2, 268) to 
compare a received signal against an adaptive threshold (Bridges, Fig. 2) in a voice activity 
detector (also Bridges, ¶ 18). 

i) Thus, it would have been obvious to one of ordinary skill in the art at the time the invention was 
made to implement the teachings of Bridges into the teachings of Polikaitis, et al, since Polikaitis, 
et al, suggests a speech recognition system that tests a received signal for quality measures against 
a threshold and Bridges suggests that the use of a "threshold comparator" improves the 
performance of the voice activity detection in the case where echo return loss interferes with 
voice prompt system performance. (Bridges, ¶ 20) ("Controlling the threshold on the basis of the 
echo return loss measured not only reduces the number of false triggerings by the voice activity 
detector due to echo, but also reduces the number of triggerings of the voice activity detector 
when the user makes a response over a line having a high amount of echo."). 

I. Conclusion 

37. The prior art made of record and not relied upon is considered pertinent to applicant's 
disclosure. The Applicant's claims read on the prior art as described below. 

a) USPN 4905286, Nigel C. Sedgwick, et al, ("Noise compensation in speech recognition") 
(hereinafter "Sedgwick"). Sedgwick, et al, teach a system in which the speech 
recognition process mode adapts to input signal noise, based on information processed 
from the input signal and noise. 

b) USPN 6381570, Dunling, Li, et al ("Adaptive two-threshold method for discriminating 
noise from speech in a communication signal") (hereinafter "Dunling"). Invention in 
which speech recognition processing adapts to input signal and/or noise levels, and 
information processed from them. 

c) JP # EP 1085501 A2, Komori, Yasuhiro, ("Client-server based speech recognition"). 
Prior art on which claim 8 reads, teaching the delivery of voiced diagnostic information 
to the user in an interactive, remote automatic speech recognition system. Komori, et al, 
provide a detailed teaching of a diagnostic system for an interactive remote session-based 
speech recognition process that integrates the teachings of Steinbrenner's 
communications system automatic diagnostic features, and provides other acoustical 
features diagnostic process as well. 

d) WO 2000/072307, Drenth, Egbert; Kamperman, J; Huisman Victor, and Boves, 
Lodewijk, (hereinafter "Drenth, et al") ("Speech-processing system for speech-frame- 
oriented telephone-speech link, uses control parameters relating to specific characteristic 
of signal entered from source to speech-recognition module"). If "and/or" were read as 
"and", claim 1 would still be obvious within the meaning of 35 U.S.C. § 103 in light of 
Polikaitis, et al, and further in view of Drenth, et al. Drenth, et al, teach a speech 
recognition system in which the speech recognition is dynamically controlled in advance 
of the recognition process based on presence of a speech signal and frame quality. 

e) JP # 2002244696 A, Tsunashima, Noriyuki, ("Controller by Speech Recognition"). 
Limitations in claims 1 through 9 read on Tsunashima's controller for speech recognition 
system, in which the speech recognition is controlled in accordance with input signal 
threshold test results. 

38. In any amendment or correction of the Specification and claims as suggested, required or 
necessitated herein, no new matter may be added. 

39. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Annette Keller whose telephone number is (571) 270-3779. The 
examiner can normally be reached on Monday - Thursday 7:30 a.m. - 6:00 p.m. EST. 

40. If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Pankaj Kumar, can be reached on (571) 272-3011. The fax phone number for the 
organization where this application or proceeding is assigned is 571-273-8300. 

41. Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or access to the automated 
information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Pankaj Kumar/ 

Supervisory Patent Examiner, Art Unit 2611 



